<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Replication Group Life Cycle</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
    <link rel="start" href="index.html" title="Getting Started with Berkeley DB, Java Edition High Availability Applications" />
    <link rel="up" href="introduction.html" title="Chapter 1. Introduction" />
    <link rel="prev" href="datamanagement.html" title="Managing Data Guarantees" />
    <link rel="next" href="progoverview.html" title="Chapter 2. Replication API First Steps" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Replication Group Life Cycle</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="datamanagement.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 1. Introduction</th>
          <td width="20%" align="right"> <a accesskey="n" href="progoverview.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="lifecycle"></a>Replication Group Life Cycle</h2>
          </div>
        </div>
      </div>
      <div class="toc">
        <dl>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#lifecycle-terms">Terminology</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#nodestates">Node States</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#lifecycle-new">New Replication Group Startup</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#lifecycle-established">Subsequent Startups</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#lifecycle-nodestartup">Replica Startup</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#lifecycle-masterfailover">Master Failover</a>
            </span>
          </dt>
          <dt>
            <span class="sect2">
              <a href="lifecycle.html#twonode">Two Node Groups</a>
            </span>
          </dt>
        </dl>
      </div>
      <p>
              This section describes how your replication group behaves
              over the course of the application's lifetime. Startup is
              described, both for new nodes as well as for existing nodes
              that are restarting. This section also describes Master
              failover.
          </p>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="lifecycle-terms"></a>Terminology</h3>
            </div>
          </div>
        </div>
        <p>
                    Before continuing, it is necessary to define some terms
                    used in this document as they relate to
                    node participation in a replication group.
              </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                          Add/Remove
                      </p>
              <p>
                          When we say that a node has been
                          <span class="emphasis"><em>added</em></span> to a replication group,
                          this means that it has become a persistent member of
                          the group. Regardless of whether the node is running
                          or otherwise reachable by the group, once it has been
                          added to the group it remains a member of the group.
                          If the node is an electable node, the group size
                          used during elections, or transaction commit
                          acknowledgements, is increased by one.
                      </p>
              <p>
                          A node that has been added to a replication group
                          remains a member of that group until it is explicitly
                          <span class="emphasis"><em>removed</em></span> from the group. Once a
                          node has been removed from the group, it is no longer a member
                          of the group. If the node that was removed was an
                          electable node, the group size used during
                          elections, or transaction commit
                          acknowledgements, is decreased by one.
                      </p>
            </li>
            <li>
              <p>
                          Join/Leave
                      </p>
              <p>
                          We say that a member has <span class="emphasis"><em>joined</em></span> the
                          replication group when it starts up and begins
                          participating in the group as an active node.
                          Electable nodes join a replication group by
                          successfully opening a <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a> handle.
                      </p>
              <p>
                          A member, then, <span class="emphasis"><em>leaves</em></span> a
                          replication group by shutting down, or otherwise
                          ceasing to participate in the group as an active node.
                          When operating normally, electable nodes leave a
                          replication group by closing its last
                          <a class="ulink" href="../java/com/sleepycat/je/rep/ReplicatedEnvironment.html" target="_top">ReplicatedEnvironment</a> handle.
                      </p>
              <p>
                          Joining or leaving a group does not change the
                          group size, and so the number of nodes required
                          to hold an election, as well as the number of
                          nodes required to acknowledge transaction
                          commits, does not change.
                      </p>
            </li>
          </ul>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="nodestates"></a>Node States</h3>
            </div>
          </div>
        </div>
        <p>
                  Member nodes can be in the following states:
              </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                          Master
                      </p>
              <p>
                          When in the Master state, a member node can service read and
                          write requests. At any given time, there can be only one node in the
                          Master state in the replication group.
                      </p>
            </li>
            <li>
              <p>
                          Replica
                      </p>
              <p>
                          Member nodes in the Replica state can only service read
                          requests. The majority of nodes in the replication
                          group should be in the Replica state.
                      </p>
            </li>
            <li>
              <p>
                          Unknown
                      </p>
              <p>
                          The member node is not aware of a Master and is actively
                          trying to discover or elect a Master. A node in this
                          state is constantly striving to transition to the
                          more productive Master or Replica state. 
                      </p>
              <p>
                          A node in the Unknown state can still process read
                          transactions if the node can satisfy its transaction
                          consistency requirements.
                      </p>
            </li>
            <li>
              <p>
                          Detached
                      </p>
              <p>
                          The member node has been shutdown (that is, it has left the
                          group, but it has not been removed from the group
                          — see the previous section). It is still a member of
                          the replication group, but is not an active participant.
                      </p>
            </li>
          </ul>
        </div>
        <p>
                  Note that from time to time this documentation uses the term
                  <span class="emphasis"><em>active node</em></span>. An active node is a member node
                  that is in the Master, Replica or Unknown state. More to the
                  point, an active node is a node that is available to
                  participate in elections.
              </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="lifecycle-new"></a>New Replication Group Startup</h3>
            </div>
          </div>
        </div>
        <p>
                  The first time you start up a replication group, the group
                  exists (for at least a small time) as a group of size one. At
                  this time, the single node belonging to the group becomes the
                  Master. So long as there is only one node in the
                  replication group, that one node acts behaves as if it is a
                  non-replicated application. There are some differences in the
                  format of the log file that the application maintains, but it
                  otherwise behaves identically to a non-replicated transactional
                  application.
              </p>
        <p>
                  Subsequently, upon startup a new node must be given the contact
                  information for at least one currently active node in the
                  replication group in order for it to be added to the
                  group.  The new node contacts this active node
                  who will identify the Master for the new node.
              </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
                      As is the case with elections, a node cannot be added
                      to the replication group unless a simple majority of
                      nodes are active at the time that it starts up. If
                      too many nodes are down or otherwise unavailable, you
                      cannot add a new node to the group.
                  </p>
        </div>
        <p>
                  The new node then contacts the Master, and provides all
                  necessary identification information about itself to the Master.
                  This includes host and port information, the node's unique
                  name, and the replication group name. The Master stores this
                  identifying information about the node. Because this
                  information is stored persistently, the effective size of the
                  replication group has just grown by one.
              </p>
        <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
          <h3 class="title">Note</h3>
          <p>
                      Note that the node is now a permanent member of the
                      replication group until you manually remove it. This
                      is true even if you shutdown the node for a long
                      time. See <a class="xref" href="utilities.html#node-addremove" title="Adding and Removing Nodes from the Group">Adding and Removing Nodes from the Group</a> 
                      for details.
                  </p>
        </div>
        <p>
                  Once the new node is an established member of the group, the
                  Master provides the Replica with the logical logs needed to
                  replicate the environment. The sequence of logical log
                  records sent from the Master to the Replica constitutes the
                  <span class="emphasis"><em>Replication Stream</em></span>. At this time, the
                  node is said to have <span class="emphasis"><em>joined</em></span> the group. 
                  Once a replication stream is established, it is maintained until either the
                  Replica or the Master goes down. 
              </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="lifecycle-established"></a>Subsequent Startups</h3>
            </div>
          </div>
        </div>
        <p>
                  Each node stores information about
                  other replication group members in its replicated
                  environment so that this information is available to it
                  upon restart.
              </p>
        <p>
                  When a node that is already an established
                  member of a replication group is restarted, the node uses its
                  knowledge of other members of the replication group to
                  locate the Master. It does this by by querying the
                  members of the group to locate the current Master. If it
                  finds a Master, the node joins the group and proceeds to participate in the
                  group as a Replica.
              </p>
        <p>
                  If a Master is not available, the restarting node
                  initiates an election so as to establish one. If a
                  simple majority of nodes are available for the election,
                  a Master is elected. If the restarting node is elected
                  Master, it then waits for Replicas to connect to it so
                  that it can supply them a replication stream.
              </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="lifecycle-nodestartup"></a>Replica Startup</h3>
            </div>
          </div>
        </div>
        <p>
                  Regardless of how it happens, when a node joins
                  a replication group, it contacts the
                  Master and then goes through the following three steps:
              </p>
        <div class="orderedlist">
          <ol type="1">
            <li>
              <p>
                          Handshake
                      </p>
              <p>
                          The Replica sends the Master its configuration
                          information, along with the unique name
                          associated with the Replica's environment. This
                          name is a pseudo-randomly generated Universal
                          Unique Identifier (UUID). 
                      </p>
              <p>
                          This handshake establishes the node as a valid
                          member of the group. It is used both by new nodes
                          joining the group for the first time, and by
                          existing nodes that are simply restarting.
                      </p>
              <p>
                          In addition, during this handshake process, the
                          Master and Replica nodes will compare their
                          clocks. If the clocks are too far off from one
                          another, the handshake will fail and the Replica
                          node will fail to start up.  See
                          <a class="xref" href="timesync.html" title="Time Synchronization">Time Synchronization</a>
                          for more information.
                      </p>
            </li>
            <li>
              <p>
                          Replication Stream Sync-Up
                      </p>
              <p>
                          The Replica sends the Master its current position
                          in the replication stream sequence. The Master
                          and Replica then negotiate a point in the
                          replication stream that the Master can use as a
                          starting point to resume the flow of logical
                          records to the Replica.
                      </p>
              <p>
                          Note that normally this sync-up process will be
                          transparent to your application. However, in rare
                          cases the sync-up may require that committed
                          transactions be undone.
                      </p>
              <p>
                          Also, if the Replica has been offline for a long
                          time, it is possible that the Master can no
                          longer supply the Replica with the required contiguous
                          interval of the replication stream. (This can
                          happen due to log cleaning on the Master.) In
                          this case, the log files must be copied to the
                          restarting node from some other up-to-date node
                          in the replication group. See 
                          <a class="xref" href="logfile-restore.html" title="Restoring Log Files">Restoring Log Files</a>
                          for details.
                      </p>
            </li>
            <li>
              <p>
                          Steady state replication stream flow
                      </p>
              <p>
                          Once the Replica has successfully started up and
                          joined the group, the
                          Master maintains a flow of log records to the
                          Replica. Beyond that, the Master will request
                          acknowledgements from the Replica whenever the
                          Master needs to meet transaction commit
                          durability requirements.
                      </p>
            </li>
          </ol>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="lifecycle-masterfailover"></a>Master Failover</h3>
            </div>
          </div>
        </div>
        <p>
                  A Master failing or shutting down causes all of the replication streams
                  between the Master and its various Replicas to terminate.
                  In reaction, the Replicas transition to the Unknown state
                  and initiate an election.
              </p>
        <p>
                  An election can be held if at least a simple majority of
                  the replication group's nodes are active. The node
                  that wins the election transitions to the Master state,
                  and all other active nodes transition to the Replica
                  state.
              </p>
        <p>
                  Upon transitioning to the Replica state, nodes connect to
                  the new Master and proceed through the handshake,
                  sync-up, replication replay process described in the
                  previous section.
              </p>
        <p>
                  If no Master can be elected (because a majority of nodes
                  are not available to participate in the election), then
                  the nodes remain in the Unknown state until such a time
                  as a Master can be elected. In this state, the nodes
                  might be able to service read-only requests, but the
                  replication group is incapable of servicing write
                  requests. Read requests can be serviced so long as the
                  transaction's consistency requirements can be met (see 
                  <a class="xref" href="consistency.html" title="Managing Consistency">Managing Consistency</a>). 
              </p>
        <p>
                  Note that the JE Replication application needs to make
                  provisions for the following state transitions after
                  failover:
              </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                          A node that transitions from the Replica state to
                          the Master state as a result of a failover needs
                          to start accepting update requests. There are
                          several ways to determine whether a node can
                          handle update requests. See
                          <a class="xref" href="replicawrites.html" title="Managing Write Requests at a Replica">Managing Write Requests at a Replica</a>
                          for more information.
                      </p>
            </li>
            <li>
              <p>
                          If a node remains in the Replica state after a
                          failover, the failover should be transparent to
                          the application. However, an application may need
                          to take corrective action in the rare situation
                          where the sync-up process has to roll back
                          committed transactions.
                      </p>
              <p>
                          See <a class="xref" href="txnrollback.html" title="Managing Transaction Rollbacks">Managing Transaction Rollbacks</a> 
                          for an example of how handle a transaction commit roll back.
                      </p>
            </li>
          </ul>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="twonode"></a>Two Node Groups</h3>
            </div>
          </div>
        </div>
        <p>
                  Replication groups comprised of just two nodes represents a
                  unique corner case for JE replication. In order to elect
                  a master, usually a simple majority of nodes must be available to
                  participate in an election. However, for replication groups
                  of size two, if even one node is unavailable for the election
                  then by default it is impossible to hold an election.
              </p>
        <p>
                  However, for some classes of application, it is desirable
                  for the application to proceed operations with just one node.
                  That is, the application trades off the durability guarantees
                  offered by using two nodes for the higher availability
                  permissible by allowing the application to run with just one
                  node.
              </p>
        <p>
                  JE allows you to do this by designating one of the nodes
                  in a two-node group as a <span class="emphasis"><em>primary node</em></span>.
                  The other node in the group is then, implicitly, the
                  secondary node. When the secondary node is not available, the
                  number of nodes required for a simply majority is reduced
                  from two to one by the primary node. Consequently, the
                  primary node is able to elect itself as the Master. It can
                  then commit transactions that require a simple majority to
                  acknowledge commits. When the secondary becomes available
                  again, the number of nodes required for a simple majority at
                  the primary once again reverts to two.
              </p>
        <p>
                  At any given time, there must be either zero or one
                  nodes designated as the primary node, but it is up to your
                  application to make sure both nodes are not erroneously
                  designated as the primary. Your application must be very
                  careful not to mistakenly designate two nodes as the primary.
                  If this happened, and the two nodes could not communicate
                  with one another (due to a network malfunction of some kind,
                  for example), they could both then consider themselves to be
                  Masters and start accepting write requests. This violates a
                  fundamental requirement that at any given instant in time,
                  there should be exactly one node that is permitted to perform
                  writes on the replicated environment.
              </p>
        <p>
                  Note that the secondary always needs two nodes for a simple
                  majority, so it can never become the Master in the absence of
                  the primary node. If the primary node fails, you can make
                  provisions to swap the primary and secondary designations so
                  that the surviving node is now the primary. This swap must be
                  performed carefully so as to ensure that both nodes are not
                  concurrently designated the primary. The most important thing
                  is that the failed node comes up as the secondary after it
                  has been repaired.
              </p>
        <p>
                  For more information on using two-node groups, see 
                  <a class="xref" href="two-node.html" title="Configuring Two-Node Groups">Configuring Two-Node Groups</a>.
              </p>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="datamanagement.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="introduction.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="progoverview.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Managing Data Guarantees </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Chapter 2. Replication API First Steps</td>
        </tr>
      </table>
    </div>
  </body>
</html>
